Direct Modeling of Prosody: An Overview of Applications in Automatic Speech Processing

Authors

Elizabeth Shriberg

Andreas Stolcke

Abstract

We describe a “direct modeling” approach to using prosody in various speech technology tasks. The approach does not involve any hand-labeling or modeling of prosodic events such as pitch accents or boundary tones. Instead, prosodic features are extracted directly from the speech signal and from the output of an automatic speech recognizer. Machine learning techniques then determine a prosodic model, which is integrated with lexical and other information to predict the target classes of interest. We discuss task-specific modeling and results for a line of research covering four general application areas: (1) structural tagging (finding sentence boundaries, disfluencies), (2) pragmatic and paralinguistic tagging (classifying dialog acts, emotion, and “hot spots”), (3) speaker recognition, and (4) word recognition itself. To provide an idea of performance on real-world data, we focus on spontaneous (rather than read or acted) speech from a variety of contexts, including human-human telephone conversations, game-playing, human-computer dialog, and multi-party meetings.
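The integration step mentioned in the abstract (a prosodic model combined with lexical information to predict a target class) can be illustrated with a minimal sketch. It assumes each model already outputs a posterior over the same classes and combines them by log-linear interpolation. This is one common combination scheme, not necessarily the one used by the authors. The function name `combine_posteriors`, the `weight` parameter, and the example probabilities are all hypothetical.

```python
import math


def combine_posteriors(prosodic, lexical, weight=0.5):
    """Log-linearly interpolate two class-posterior dicts and renormalize.

    `weight` is the share given to the prosodic model (an assumed tuning
    parameter, typically chosen on held-out data).
    """
    classes = prosodic.keys() & lexical.keys()
    floor = 1e-12  # avoid log(0) for classes a model rules out entirely
    scores = {
        c: weight * math.log(max(prosodic[c], floor))
        + (1.0 - weight) * math.log(max(lexical[c], floor))
        for c in classes
    }
    top = max(scores.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


# Hypothetical posteriors for one word boundary (illustrative values only).
prosodic_posterior = {"boundary": 0.7, "no_boundary": 0.3}
lexical_posterior = {"boundary": 0.4, "no_boundary": 0.6}

combined = combine_posteriors(prosodic_posterior, lexical_posterior, weight=0.5)
best = max(combined, key=combined.get)
```

With `weight=1.0` the result reduces to the prosodic posterior alone, and with `weight=0.0` to the lexical posterior alone, which makes it easy to compare each knowledge source against the combination.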

Similar resources

Prosody Modeling for Automatic Speech Understanding: An Overview of Recent Research at SRI

Prosody has long been studied as an important knowledge source for speech understanding. In recent years there has been a large amount of computational work aimed at prosodic modeling for automatic speech recognition and understanding. Whereas most current approaches to speech processing model only the words, prosody provides an additional knowledge source that is inherent in, and exclusive to,...

A general-purpose 32 ms prosodic vector for hidden Markov modeling

Prosody plays a central role in conversation, making it important for speech technologies to model. Unfortunately, the application of standard modeling techniques to the acoustics of prosody has been hindered by difficulties in modeling intonation. In this work, we explore the suitability of the recently introduced fundamental frequency variation (FFV) spectrum as a candidate general representa...

Automatic labeling of Japanese prosody using J-ToBI style description

Speech corpora with prosodic labels are getting more and more important not only for speech synthesis but also for discourse modeling. A widely used labeling system for Japanese prosody, J-ToBI, however, is insufficient for applications like discourse modeling and it even lacks an accurate method for automatic labeling. In this paper, we propose an automatic labeling method for J-ToBI style des...

Prosody Modeling for Automatic Speech Recognition and Understanding

This paper summarizes statistical modeling approaches for the use of prosody (the rhythm and melody of speech) in automatic recognition and understanding of speech. We outline effective prosodic feature extraction, model architectures, and techniques to combine prosodic with lexical (word-based) information. We then survey a number of applications of the framework, and give results for automati...

Prosodic models, automatic speech understanding, and speech synthesis: towards the common ground

Automatic speech understanding and speech synthesis, two of the major speech processing applications, impose strikingly different constraints and requirements on prosodic models. The prevalent models of prosody and intonation fail to offer a unified solution to these conflicting constraints. As a consequence, prosodic models have been applied only occasionally in end-to-end automatic speech unde...

Publication date: 2004
